Generalization in Reinforcement Learning by Soft Data Augmentation
Extensive efforts have been made to improve the generalization ability of
Reinforcement Learning (RL) methods via domain randomization and data
augmentation. However, as more factors of variation are introduced during
training, optimization becomes increasingly challenging, and empirically may
result in lower sample efficiency and unstable training. Instead of learning
policies directly from augmented data, we propose SOft Data Augmentation
(SODA), a method that decouples augmentation from policy learning.
Specifically, SODA imposes a soft constraint on the encoder that aims to
maximize the mutual information between latent representations of augmented and
non-augmented data, while the RL optimization process uses strictly
non-augmented data. Empirical evaluations are performed on diverse tasks from
DeepMind Control suite as well as a robotic manipulation task, and we find SODA
to significantly advance sample efficiency, generalization, and stability in
training over state-of-the-art vision-based RL methods.
Comment: Website: https://nicklashansen.github.io/SODA/ Code:
https://github.com/nicklashansen/dmcontrol-generalization-benchmark.
Presented at International Conference on Robotics and Automation (ICRA) 202
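The soft constraint described in the SODA abstract above can be illustrated with a minimal sketch. It assumes a consistency objective that pulls the latent of an augmented observation toward the latent of the non-augmented one, using the squared distance between L2-normalized vectors as a stand-in for the mutual-information objective. The function names and the toy vectors are hypothetical and are not taken from the SODA code base.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length (a zero vector is returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return list(v) if norm == 0.0 else [x / norm for x in v]


def consistency_loss(z_aug, z_clean):
    """Squared Euclidean distance between unit-normalized latents.

    Assumption: this stands in for the mutual-information objective named
    in the abstract. It is 0 when the two latents point the same way and
    4 when they point in opposite directions.
    """
    a, b = l2_normalize(z_aug), l2_normalize(z_clean)
    return sum((x - y) ** 2 for x, y in zip(a, b))


# Hypothetical latents: the encoder output for a clean frame and for
# an augmented copy of the same frame.
z_clean = [1.0, 0.0]
z_augmented = [0.9, 0.1]
print(consistency_loss(z_augmented, z_clean))
```

In a setup like the one the abstract describes, a loss of this kind would act only on the encoder as an auxiliary term, while the RL objective would still be computed from non-augmented observations.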
TD-MPC2: Scalable, Robust World Models for Continuous Control
TD-MPC is a model-based reinforcement learning (RL) algorithm that performs
local trajectory optimization in the latent space of a learned implicit
(decoder-free) world model. In this work, we present TD-MPC2: a series of
improvements upon the TD-MPC algorithm. We demonstrate that TD-MPC2 improves
significantly over baselines across 104 online RL tasks spanning 4 diverse task
domains, achieving consistently strong results with a single set of
hyperparameters. We further show that agent capabilities increase with model
and data size, and successfully train a single 317M parameter agent to perform
80 tasks across multiple task domains, embodiments, and action spaces. We
conclude with an account of lessons, opportunities, and risks associated with
large TD-MPC2 agents. Explore videos, models, data, code, and more at
https://nicklashansen.github.io/td-mpc2
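Planning in the latent space of a learned model, as the TD-MPC2 abstract above describes, can be sketched as follows. This is a minimal illustration that assumes plain random shooting over scalar actions, which is a simpler stand-in for the trajectory optimizer and not the algorithm's own procedure. The toy dynamics, reward, and value functions are hypothetical placeholders for learned networks.

```python
import random


def plan(z0, dynamics, reward, value, horizon=3, num_samples=64, seed=0):
    """Return the first action and score of the best sampled action sequence.

    Assumption: random shooting over actions in [-1, 1]; a stand-in for a
    more capable trajectory optimizer.
    """
    rng = random.Random(seed)
    # Include the all-zeros sequence so that, under the model, the chosen
    # plan never scores below taking no action.
    candidates = [[0.0] * horizon]
    candidates += [[rng.uniform(-1.0, 1.0) for _ in range(horizon)]
                   for _ in range(num_samples)]

    def score(actions):
        z, total = z0, 0.0
        for a in actions:
            total += reward(z, a)
            z = dynamics(z, a)  # roll forward in latent space; nothing is decoded
        return total + value(z)  # terminal value covers return beyond the horizon

    best = max(candidates, key=score)
    return best[0], score(best)


# Hypothetical toy model: a scalar latent that should be driven to zero.
def toy_dynamics(z, a):
    return z + 0.5 * a


def toy_reward(z, a):
    return -abs(z)


def toy_value(z):
    return -abs(z)


first_action, best_score = plan(1.0, toy_dynamics, toy_reward, toy_value)
print(first_action, best_score)
```

Only the first action of the best sequence is returned, reflecting the usual receding-horizon pattern of replanning at every step.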
MoDem-V2: Visuo-Motor World Models for Real-World Robot Manipulation
Robotic systems that aspire to operate in uninstrumented real-world
environments must perceive the world directly via onboard sensing. Vision-based
learning systems aim to eliminate the need for environment instrumentation by
building an implicit understanding of the world based on raw pixels, but
navigating the contact-rich high-dimensional search space from solely sparse
visual reward signals significantly exacerbates the challenge of exploration.
The applicability of such systems is thus typically restricted to simulated or
heavily engineered environments since agent exploration in the real-world
without the guidance of explicit state estimation and dense rewards can lead to
unsafe behavior and catastrophic safety faults. In this study, we
isolate the root causes behind these limitations to develop a system, called
MoDem-V2, capable of learning contact-rich manipulation directly in the
uninstrumented real world. Building on the latest algorithmic advancements in
model-based reinforcement learning (MBRL), demo-bootstrapping, and effective
exploration, MoDem-V2 can acquire contact-rich dexterous manipulation skills
directly in the real world. We identify key ingredients for leveraging
demonstrations in model learning while respecting real-world safety
considerations -- exploration centering, agency handover, and actor-critic
ensembles. We empirically demonstrate the contribution of these ingredients in
four complex visuo-motor manipulation problems in both simulation and the real
world. To the best of our knowledge, our work presents the first successful
system for demonstration-augmented visual MBRL trained directly in the real
world. Visit https://sites.google.com/view/modem-v2 for videos and more
details.
Comment: 9 pages, 8 figures
Finetuning Offline World Models in the Real World
Reinforcement Learning (RL) is notoriously data-inefficient, which makes
training on a real robot difficult. While model-based RL algorithms (world
models) improve data-efficiency to some extent, they still require hours or
days of interaction to learn skills. Recently, offline RL has been proposed as
a framework for training RL policies on pre-existing datasets without any
online interaction. However, constraining an algorithm to a fixed dataset
induces a state-action distribution shift between training and inference, and
limits its applicability to new tasks. In this work, we seek to get the best of
both worlds: we consider the problem of pretraining a world model with offline
data collected on a real robot, and then finetuning the model on online data
collected by planning with the learned model. To mitigate extrapolation errors
during online interaction, we propose to regularize the planner at test-time by
balancing estimated returns and (epistemic) model uncertainty. We evaluate our
method on a variety of visuo-motor control tasks in simulation and on a real
robot, and find that our method enables few-shot finetuning to seen and unseen
tasks even when offline data is limited. Videos, code, and data are available
at https://yunhaifeng.com/FOWM .
Comment: CoRL 2023 Oral
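The idea of balancing estimated returns against epistemic uncertainty at planning time, described in the abstract above, can be sketched in a few lines. This is a minimal illustration that assumes an ensemble of return estimates per candidate plan and uses the ensemble's standard deviation as the uncertainty proxy. The `penalty` coefficient, the function names, and the example numbers are hypothetical.

```python
import statistics


def regularized_score(q_estimates, penalty=1.0):
    """Mean return estimate minus a penalty on ensemble disagreement.

    Assumption: the population standard deviation across an ensemble of
    return estimates serves as the epistemic-uncertainty proxy.
    """
    return statistics.fmean(q_estimates) - penalty * statistics.pstdev(q_estimates)


def select_plan(candidates, penalty=1.0):
    """Return the name of the candidate with the highest regularized score.

    `candidates` maps a plan name to its list of ensemble return estimates.
    """
    return max(candidates, key=lambda name: regularized_score(candidates[name], penalty))


# Hypothetical ensemble estimates for two candidate plans.
candidates = {
    "familiar": [5.0, 5.0, 5.0, 5.0],  # ensemble members agree
    "novel": [10.0, 2.0, 10.0, 2.0],   # higher mean, strong disagreement
}
print(select_plan(candidates, penalty=0.0))
print(select_plan(candidates, penalty=1.0))
```

With no penalty the planner prefers the plan with the higher mean estimate; with the penalty enabled it prefers the plan the ensemble agrees on, which is the behavior one would want when extrapolation errors are a concern.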
Learning Vision-Guided Quadrupedal Locomotion End-to-End with Cross-Modal Transformers
We propose to address quadrupedal locomotion tasks using Reinforcement
Learning (RL) with a Transformer-based model that learns to combine
proprioceptive information and high-dimensional depth sensor inputs. While
learning-based locomotion has made great advances using RL, most methods still
rely on domain randomization for training blind agents that generalize to
challenging terrains. Our key insight is that proprioceptive states only offer
contact measurements for immediate reaction, whereas an agent equipped with
visual sensory observations can learn to proactively maneuver environments with
obstacles and uneven terrain by anticipating changes in the environment many
steps ahead. In this paper, we introduce LocoTransformer, an end-to-end RL
method for quadrupedal locomotion that leverages a Transformer-based model for
fusing proprioceptive states and visual observations. We evaluate our method in
challenging simulated environments with different obstacles and uneven terrain.
We show that our method obtains significant improvements over policies with
only proprioceptive state inputs, and that Transformer-based models further
improve generalization across environments. Our project page with videos is at
https://RchalYang.github.io/LocoTransformer .
Self-supervised policy adaptation during deployment
In most real-world scenarios, a policy trained by reinforcement learning in one environment needs to be deployed in another, potentially quite different environment. However, generalization across different environments is known to be hard. A natural solution would be to keep training after deployment in the new environment, but this cannot be done if the new environment offers no reward signal. Our work explores the use of self-supervision to allow the policy to continue training after deployment without using any rewards. While previous methods explicitly anticipate changes in the new environment, we assume no prior knowledge of those changes yet still obtain significant improvements. Empirical evaluations are performed on diverse simulation environments from DeepMind Control suite and ViZDoom, as well as real robotic manipulation tasks in continuously changing environments, taking observations from an uncalibrated camera. Our method improves generalization in 31 out of 36 environments across various tasks and outperforms domain randomization on a majority of environments. Webpage and implementation: https://nicklashansen.github.io/PAD/.
Peer Reviewed. Postprint (published version)
Electrical energy by electrode placement for cardioversion of atrial fibrillation: a systematic review and meta-analysis
OBJECTIVE: Electrode patch position may not be critical for success when cardioverting atrial fibrillation (AF), but the relevance of applied electrical energy is unclarified. Our objective was to perform a meta-analysis of randomised trials to examine the dose-response relation between energy level and cardioversion success by electrode position in elective cardioversion.
METHODS: We searched PubMed, Embase, The Cochrane Library, Google Scholar and Scopus Citations. Inclusion criteria were randomised controlled trials using biphasic shock waves and self-adhesive patches, and publication date from 2000 to 2023. We used random-effects dose-response models to meta-analyse the relation between energy level and cardioversion success by anterolateral and anteroposterior position. Random-effects models estimated pooled risk ratios (RR) for cardioversion success after the first and the final shocks between the two electrode positions.
RESULTS: We included five randomised controlled trials (N=1078). After the first low-energy shock, the electrode position was not significantly associated with the likelihood of successful cardioversion (pooled RR anterolateral vs anteroposterior placement 1.28, 95% CI 0.93 to 1.76, with considerable heterogeneity). After a high-energy final shock, there was no evidence of an association between the electrode position and the cumulative chance of cardioversion success (pooled RR anterolateral vs anteroposterior 1.05, 95% CI 0.97 to 1.14). Regardless of electrode position, cardioversion success was significantly less likely with shock energy levels < 200J compared with 200J.
CONCLUSION: Evidence from contemporary randomised trials suggests that higher level of electrical energy is associated with higher conversion rate when cardioverting AF with a biphasic shockwave. Positioning of electrodes can be based on convenience.
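For readers unfamiliar with pooled risk ratios, the following minimal sketch shows one common way a random-effects estimate can be computed. The abstract does not name the estimator, so the DerSimonian-Laird method used here is an assumption, and the trial counts in the example are invented for illustration only; they are not data from the review.

```python
import math


def pooled_risk_ratio(trials):
    """Pool risk ratios with a DerSimonian-Laird random-effects model.

    `trials` holds tuples (events_a, total_a, events_b, total_b).
    Returns (pooled RR, 95% CI lower, 95% CI upper, tau^2).
    Assumption: every arm has at least one event, so no continuity
    correction is applied.
    """
    logs, variances = [], []
    for ea, na, eb, nb in trials:
        logs.append(math.log((ea / na) / (eb / nb)))
        variances.append(1 / ea - 1 / na + 1 / eb - 1 / nb)  # variance of log RR

    w = [1 / v for v in variances]  # inverse-variance (fixed-effect) weights
    fixed = sum(wi * y for wi, y in zip(w, logs)) / sum(w)
    q = sum(wi * (y - fixed) ** 2 for wi, y in zip(w, logs))  # Cochran's Q
    c = sum(w) - sum(wi * wi for wi in w) / sum(w)
    # Between-trial variance, truncated at zero.
    tau2 = max(0.0, (q - (len(trials) - 1)) / c) if c > 0 else 0.0

    w_re = [1 / (v + tau2) for v in variances]  # random-effects weights
    pooled = sum(wi * y for wi, y in zip(w_re, logs)) / sum(w_re)
    se = math.sqrt(1 / sum(w_re))
    return (math.exp(pooled), math.exp(pooled - 1.96 * se),
            math.exp(pooled + 1.96 * se), tau2)


# Invented counts: (successes, patients) for position A, then position B.
example_trials = [(60, 100, 50, 100), (45, 80, 44, 80), (70, 120, 52, 118)]
print(pooled_risk_ratio(example_trials))
```

A confidence interval that contains 1 corresponds to the "not significantly associated" wording used in the results above.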
GNFactor: Multi-Task Real Robot Learning with Generalizable Neural Feature Fields
It is a long-standing problem in robotics to develop agents capable of
executing diverse manipulation tasks from visual observations in unstructured
real-world environments. To achieve this goal, the robot needs to have a
comprehensive understanding of the 3D structure and semantics of the scene. In
this work, we present GNFactor, a visual behavior cloning agent for
multi-task robotic manipulation with Generalizable Neural
feature Fields. GNFactor jointly optimizes a generalizable neural
field (GNF) as a reconstruction module and a Perceiver Transformer as a
decision-making module, leveraging a shared deep 3D voxel representation. To
incorporate semantics in 3D, the reconstruction module utilizes a
vision-language foundation model (e.g., Stable Diffusion) to distill
rich semantic information into the deep 3D voxel. We evaluate GNFactor on 3
real robot tasks and perform detailed ablations on 10 RLBench tasks with a
limited number of demonstrations. We observe a substantial improvement of
GNFactor over current state-of-the-art methods in seen and unseen tasks,
demonstrating the strong generalization ability of GNFactor. Our project
website is https://yanjieze.com/GNFactor/ .
Comment: CoRL 2023 Oral.